ABSTRACT

Paraphrase, as it reflects the processes of 
remembering rather than those of forgetting, implies that language is 
best transmitted in one form and stored in another. The dual 
representation of linguistic information that is implied by 
paraphrase is important for storing information that has been 
received and for transmitting information that has been stored. Such 
duality implies a process of recoding that is somehow constrained by 
a grammar. Grammar is seen as a set of complex codes that relates 
transmitted sound and stored meaning. This paper considers the 
construction characteristics of the speech code which unites the 
acoustic signal for transmission and the phonetic representation 
appropriate for storage in short-term memory. The speech code in 
terms of memory research is then considered with speculation on the 
construction of the memory code. (Author/VM) 
Language Codes and Memory Codes

Alvin M. Liberman,* Ignatius G. Mattingly,** and Michael T. Turvey 
Haskins Laboratories, New Haven 

INTRODUCTION: PARAPHRASE, GRAMMATICAL CODES, AND MEMORY

When people recall linguistic information, they commonly produce utter- 
ances different in form from those originally presented. Except in special 
cases where the information does not exceed the immediate memory span, or 
where rote memory is for some reason required, recall is always a paraphrase. 

There are at least two ways in which we can look at paraphrase in memo- 
ry for linguistic material and linguistic episodes. We can view paraphrase 
as indicating the considerable degree to which detail is forgotten: at best,
what is retained are several choice words with a certain syntactic structure, 
which, together, serve to guide and constrain subsequent attempts to recon- 
struct the original form of the information. On this view, rote recall is 
the ideal, and paraphrase is so much error. Alternatively, we can view the 
paraphrase not as an index of what has been forgotten but rather as an essen- 
tial condition or correlate of the processes by which we normally remember. 

On this view, rote recall is not the ideal, and paraphrase is something other
than failure to recall. It is evident that any large amount of linguistic 
information is not, and cannot be, stored in the form in which it was pre- 
sented. Indeed, if it were, then we should probably have run out of memory 
space at a very early age. 

We may choose, then, between two views of paraphrase: the first would
say that the form of the information undergoes change because of forgetting; 
the second, that the processes of remembering make such change all but inevi- 
table. In this paper we have adopted the second view, that paraphrase re- 
flects the processes of remembering rather than those of forgetting. Putting 
this view another way, we should say that the ubiquitous fact of paraphrase 
implies that language is best transmitted in one form and stored in another. 

The dual representation of linguistic information that is implied by 
paraphrase is important, then, if we are to store information that has been 
received and to transmit information that has been stored. We take it that 
such duality implies, in turn, a process of recoding that is somehow 
constrained by a grammar. Thus, the capacity for paraphrase reflects the
fundamental grammatical characteristics of language. We should say, there- 
fore, that efficient memory for linguistic information depends, to a consid- 
erable extent, on grammar. 



To illustrate this point of view, we might imagine languages that lack 
a significant number of the grammatical devices that all natural languages 
have. We should suppose that the possibilities for recoding and paraphrase
would, as a consequence, be limited, and that the users of such languages 
would not remember linguistic information very well. Pidgins appear to be 
grammatically impoverished and, indeed, to permit little paraphrase, but 
unfortunately for our purposes, speakers of pidgins also speak some natural 
language, so they can convert back and forth between the natural language 
and the pidgin. Sign language of the deaf, on the other hand, might conceivably
provide an interesting test. At the present time we know very little
about the grammatical characteristics of sign language, but it may prove to 
have recoding (and hence paraphrase) possibilities that are, by comparison 
with natural languages, somewhat restricted.1 If so, one could indeed hope
to determine the effects of such restriction on the ability to remember. 



In natural languages we cannot explore in that controlled way the 
causes and consequences of paraphrase, since all such languages must be as- 
sumed to be very similar in degree of grammatical complexity. Let us, there- 
fore, learn what we can by looking at the several levels or representations 
of information that we normally find in language and at the grammatical com- 
ponents that convert between them. 

At the one extreme is the acoustic level, where the information is in a 
form appropriate for transmission. As we shall see, this acoustic represen- 
tation is not the whole sound as such but rather a pattern of specifiable 
events, the acoustic cues. By a complexly encoded connection, the acoustic 
cues reflect the "features" that characterize the articulatory gestures and 
so the phonetically distinct configurations of the vocal tract. These latter 
are a full level removed from the sound in the structure of language; when 
properly combined, they are roughly equivalent to the segments of the phonetic
representation.

Only some fifteen or twenty features are needed to describe the phonetics 
of all human languages (Chomsky and Halle, 1968). Any particular language 
uses only a dozen or so features from the total ensemble, and at any particu- 
lar moment in the stream of speech only six or eight features are likely to be 
significant. The small number of features and the complex relation between 
sound and feature reflect the properties of the vocal tract and the ear and 
also, as we will show, the mismatch between these organ systems and the re- 
quirements of the phonetic message. 

At the other end of the linguistic structure is the semantic representa- 
tion in which the information is ultimately stored. Because of its relative 
inaccessibility, we cannot speak with confidence about the shape of the 



1 The possibilities for paraphrase in sign language are, in fact, being
investigated by Edward Klima and Ursula Bellugi.
[...] experience are to be represented, then the available inventory of
semantic features must be very large, much larger surely than the [...]
phonetic features that will be used as the ultimate vehicles. Though particular
semantic sets may comprise many features, it is conceivable that the
structure of a set might be quite simple. At all events, the characteristics
of the semantic representation can be assumed to reflect properties
of long-term memory, just as the very different characteristics of the acoustic
and phonetic representations reflect the properties of components most directly
concerned with transmission.

The gap between the acoustic and semantic levels is bridged by grammar.
[...] with a view of language like the one developed by the generative
grammarians (see Chomsky, 1965). On that view there are three levels (deep
structure, surface structure, and phonetic representation) in addition to the
two (acoustic and semantic) we have already talked about. As in the distinction
between acoustic and semantic levels, the information at every level has a
different structure. At the level of deep structure, for example, a string
[...] becomes [...] to surface structure, by syntactic rules; then to phonetic
representation by [...].2 [...] larger units, as in the organization of words
into phrases and sentences [...] restructuring of the information in which the
number [...] as recodings and to speak of the grammatical rules as codes. [...]
occur most freely in the syntactic and semantic codes. But the speech [...]



2 In generative grammar, as in all others, the conversion between phonetic
representation and acoustic signal is not presumed to be grammatical. As
we have argued elsewhere, however, and as will to some extent become apparent
in this paper, this conversion is a complex recoding, similar in formal
characteristics to the recodings of syntax and phonology (Mattingly and
Liberman, 1969; Liberman, 1970).
of the process that makes possible the more obvious forms of paraphrase,
as well as the efficient memory which they always accompany. 

Grammar is, then, a set of complex codes that relates transmitted sound 
and stored meaning. It also suggests what it is that the recoding processes 
must somehow accomplish. Looking at these processes from the speaker’s view- 
point, we see, for example, that the semantic features must be replaced by 
phonological features in preparation for transmission. In this conversion 
an utterance which is, at the semantic level, a single unit comprising many 
features of meaning becomes, phonologically, a number of units composed of 
a very few features, the phonologic units and features being in themselves 
meaningless. Again, the semantic representation of an utterance in coherent 
discourse will typically contain multiple references to the same topic. 

This amounts to a kind of redundancy which serves, perhaps, to protect the 
semantic representation from noise in long-term memory. In the acoustic
representation, however, to preserve such repetitions would unduly prolong
discourse. To take again the example we used earlier, we do not say "The man
sings. The man married the girl. The girl is pretty," but rather "The man
who sings married the pretty girl." The syntactic rules describe the ways in
which such redundant references are deleted. At the acoustic and phonetic 
levels, redundancy of a very different kind may be desirable. Given the 
long strings of empty elements that exist there, the rules of the phonologic 
component predict certain lawful phonetic patterns in particular contexts 
and, by this kind of redundancy, help to keep the phonetic events in their 
proper order. 

But our present knowledge of the grammar does not provide much more than 
a general framework within which to think about the problem of recoding in 
memory. It does not, for example, deal directly with the central problem of 
paraphrase. If a speaker-hearer has gone from sound to meaning by some set 
of grammatical rules, what is to prevent his going in the opposite direction 
by the inverse operations, thus producing a rote rendition of the originally 
presented information? In this connection we should say on behalf of the 
grammar that it is not an algorithm for automatically recoding in one direc- 
tion or the other, but rather a description of the relationships that must 
hold between the semantic representation, at the one end, and the correspond- 
ing acoustic representation at the other. To account for paraphrase, we must 
suppose that the speaker synthesizes the acoustic representation, given the 
corresponding semantic representation, while the listener must synthesize an 
approximately equivalent semantic representation, given the corresponding 
acoustic representation. Because the grammar only constrains these acts of 
synthesis in very general ways, there is considerable freedom in the actual
process of recoding; we assume that such freedom is essential if linguistic 
information is to be well remembered. 

For students of memory, grammatical codes are unsatisfactory in yet another,
if closely related, respect: though they may account for an otherwise
arbitrary-appearing relation between streams of information at different
levels of the linguistic structure, they do not describe the actual processes
by which the human being recodes from the one level to the other, nor does
the grammarian intend that they should. Indeed, it is an open question whether
even the levels that the grammar assumes (for example, deep structure)
have counterparts of some kind in the recoding process.






We might do well, then, to concentrate our attention on just one aspect
of grammar, the speech code that relates the acoustic and phonetic
representations. We can then avoid some of the difficulties we encounter in
the "higher" or "deeper" reaches of the language. The acoustic and phonetic
levels have been accessible to psychological (and physiological) experiment,
as a result of which we are able to talk about "real" processes and "real"
levels, yet the conversion we find there resembles grammatical codes more
generally and can be shown, in a functional as well as a formal sense, to be
an integral part of language. We will, therefore, examine in some detail
the characteristics of the speech code, having in mind that it reflects some
of the important characteristics of the broader class of language codes and
that it may, therefore, serve well as a basis for comparison with the memory
codes we are supposed to be concerned with. It is the more appropriate that
we should deal with the speech code because it comprises the conversion from
an acoustic signal appropriate for transmission to a phonetic representation
appropriate for storage in short-term memory, a process that is itself of
some interest to members of this conference.

CHARACTERISTICS OF THE SPEECH CODE



Clarity of the Signal 

It is an interesting and important fact about the speech code that the
physical signal is a poor one. We can see that this is so by looking at a
spectrographic representation of the speech signal like the one in Figure 1.
This is a picture of the phrase "to catch pink salmon." As always in a
spectrogram, frequency is on the vertical axis, time on the horizontal;
relative intensity is represented by the density, or blackness, of the marks.
The relatively darker bands are resonances of the vocal tract, the so-called
formants. We know that the lowest two or three of these formants contain
almost all of the linguistic information; yet, as we can see, the acoustic
energy is not narrowly concentrated there but tends rather to be smeared
across the spectrum; moreover, there is at least one higher formant at about
3600 cps that never varies and thus carries no linguistic information at all.
This is to say that the linguistically important cues constitute a relatively
small part of the total physical energy. To appreciate to what extent this
is so we might contrast speech with the printed alphabet, where the important
parts of the signal stand out clearly from the background. [...] synthetic
spectrogram of Figure 2, which produces intelligible speech
though the formants are unnaturally narrow and sharply defined.

In fact, the speech signal is worse than we have so far said or than we
can immediately see just by looking at a spectrogram, because the formants
are most indeterminate at precisely those points where the information they
carry is most important. It is, we know, the rapid changes in the frequency
position of the formants (the formant transitions) that are the essential
cues for most of the consonants. In the case of the stop consonants, these
changes occur in 50 msec or less, and they sometimes extend over ranges as
great as 600 cps. Such signals scatter energy and are therefore difficult
to specify or to track. Moreover, the difficulty is greatest at the point
where they begin, though that is the most important part of the transition
for the listener who wants to know the phonetic identity of the consonant.
The physical indeterminacy of the signal is an interesting aspect of
the speech code because it implies a need for processors specialized for 
the purpose of extracting the essential acoustic parameters. The output of 
these processors might be a cleaned-up description of the signal, not unlike 
the simplified synthetic spectrogram of Figure 2. But such an output, it is 
important to understand, would be auditory, not phonetic. The signal would 
only have been clarified; it would not have been decoded.

Complexity of the Code 

Like the other parts of the grammatical code, the conversion from speech 
sound to phonetic message is complex. Invoking a distinction we have previ- 
ously found useful in this connection, we should say that the conversion is 
truly a code and not a cipher (Liberman, Cooper, Shankweiler, and
Studdert-Kennedy, 1967; Studdert-Kennedy, in press). If the sounds of speech
were a simple cipher, there would be a unit sound for each phonetic segment.
Something approximating such a cipher does indeed exist in one of the written
forms of language (viz., alphabets), where each phonological segment is
represented by a discrete optical shape. But speech is not an alphabet or
cipher in that sense. In the interconversion between acoustic signal and
phonetic message the information is radically restructured so that successive
segments of the message are carried simultaneously (that is, in parallel) on
exactly the same parts of the acoustic signal. As a result, the segmentation 
of the signal does not correspond to the segmentation of the message; and the 
part of the acoustic signal that carries information about a particular pho- 
netic segment varies drastically in shape according to context. 

In Figure 3 we see schematic spectrograms that produce the syllables 
[di] and [du] and illustrate several aspects of the speech code. To synthe- 
size the vowels [i] and [u], at least in slow articulation, we need only the 
steady-state formants, that is, the parts of the pattern to the right of the
formant transitions. These acoustic segments correspond in simple fashion
to the perceived phonetic segments: they provide sufficient cues for the
vowels; they carry information about no other segments; and though the fact
is not illustrated here, they are, in slow articulation, the same in all mes- 
sage contexts. For the slowly articulated vowels, then, the relation between 
sound and message is a simple cipher. The stop consonants, on the other hand, 
are complexly encoded, even in slow articulation. To see in what sense this 
is so, we should examine the formant transitions, the rapid changes in formant 
frequency at the beginning (left) of the pattern. Transitions of the first 
(lower) formant are cues for manner and voicing; in this case they tell the 
listener that the consonants are members of the class of voiced stops [bdg]. 
For our present purposes, the transitions of the second (higher) formant (the
parts of the pattern enclosed in the broken circles) are of greater interest.
Such transitions are, in general, cues for the perceived "place" distinctions



3 Alphabets commonly make contact with the language at a level somewhat more
abstract than the phonetic. Thus, in English the letters often represent
what some linguists would call morphophonemes, as for example in the use
of "s" for what is phonetically the [s] of cats and the [z] of dogs. In
the terminology of generative grammar, the level so represented corresponds
roughly to the phonological.
Figure 3. Schematic Spectrogram for the Syllables [di] and [du]
among the consonants. In the patterns of Figure 3 they tell the listener that
the stop is [d] in both cases. Plainly, the transition cues for [d] are 
very different in the two vowel contexts: the one with [i] is a rising
transition relatively high in the spectrum, the one with [u] a falling
transition low in the spectrum. It is less obvious, perhaps, but equally true
that there is no isolable acoustic segment corresponding to the message
segment [d]: at every instant, the second-formant transition carries
information about both the consonant and the vowel. This kind of parallel
transmission reflects the fact that the consonant is truly encoded into the vowel;
this is, we would emphasize, the central characteristic of the speech code. 
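
The context dependence of the [d] cue can be put in a minimal sketch. It records only what the text says about the two second-formant transitions of Figure 3 and checks whether the cue is the same in every context, as it would be in a cipher. The names D_TRANSITION_BY_VOWEL and is_context_invariant are hypothetical, and the qualitative labels stand in for the actual formant frequencies, which are not given here.

```python
# Second-formant transition that cues [d] in each vowel context,
# as described for Figure 3: (direction, region of the spectrum).
# The labels are qualitative; no frequencies are assumed.
D_TRANSITION_BY_VOWEL = {
    "i": ("rising", "high"),
    "u": ("falling", "low"),
}


def is_context_invariant(cue_by_context):
    """Return True if the same cue appears in every context (cipher-like)."""
    return len(set(cue_by_context.values())) == 1


if __name__ == "__main__":
    # The cue for [d] differs with the following vowel, so this prints False.
    print(is_context_invariant(D_TRANSITION_BY_VOWEL))
```

A relation that passed this check for every segment would be a cipher in the sense used above; the stop consonants fail it.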

The next figure (Figure 4) shows more clearly than the last the more 
complex kind of parallel transmission that frequently occurs in speech. If 
converted to sound, the schematic spectrogram shown there is sufficient to 
produce an approximation to the syllable [bæg]. The point of the figure is
to show where information about the phonetic segments is to be found in the 
acoustic signal. Limiting our attention again to the second formant, we see 
that information about the vowel extends from the beginning of the utterance 
to the end. This is so because a change in the vowel (from [bæg] to [big],
for example) will require a change in the entire formant, not merely
somewhere in its middle section. Information about the first consonant, [b],
extends through the first two-thirds of the whole temporal extent of the for- 
mant. This can be established by showing that a change in the first segment 
of the message (from [bæg] to [gæg], for example) will require a change in
the signal from the beginning of the sound to the point, approximately two- 
thirds of the way along the formant, that we see marked in the figure. A 
similar statement and similar test apply also to the last consonant, [g]. 

In general, every part of the second formant carries information about at 
least two segments of the message; and there is a part of that formant, in 
the middle, into which all three message segments have been simultaneously 
encoded. We see, perhaps more easily than in Figure 1, that the lack of cor- 
respondence in segmentation is not trivial. It is not the case that there 
are simple extensions connecting an otherwise segmented signal, as in the 
case of cursive writing, or that there are regions of acoustic overlap sepa- 
rating acoustic sections that at some point correspond to the segments of the 
message. There is no correspondence in segmentation because several segments 
of the message have been, in a very strict sense, encoded into the same seg- 
ment of the signal. 
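
The overlap described for Figure 4 can be sketched as follows. This is only an illustration under stated assumptions: time along the second formant is normalized to run from 0 to 1; the vowel span (the whole formant) and the first-consonant span (the first two-thirds) follow the text; the span for the final consonant (the last two-thirds) is an assumption made by symmetry, since the text says only that a similar statement applies. The names SPANS and segments_carried_at are hypothetical.

```python
from fractions import Fraction

# Portion of the second formant (normalized time 0..1) that carries
# information about each segment of [bæg].
# Vowel and first-consonant spans follow the text; the span for the
# final consonant is an ASSUMPTION made by symmetry.
SPANS = {
    "b": (Fraction(0), Fraction(2, 3)),
    "æ": (Fraction(0), Fraction(1)),
    "g": (Fraction(1, 3), Fraction(1)),  # assumed
}


def segments_carried_at(t):
    """Return the message segments whose span includes normalized time t."""
    return [seg for seg, (start, end) in SPANS.items() if start <= t <= end]


if __name__ == "__main__":
    # Sample an early, a middle, and a late point on the formant.
    for t in (Fraction(1, 6), Fraction(1, 2), Fraction(5, 6)):
        print(t, segments_carried_at(t))
```

Under these assumptions every sampled point carries at least two segments, and the middle of the formant carries all three, which is the sense in which no cut through the signal isolates a single message segment.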



Transparency of the Code 



We have just seen that not all phonetic segments are necessarily encoded 
in the speech signal to the same degree. In even the slowest articulations, 
all of the consonants, except the fricatives,4 are encoded. But the vowels
(and the fricatives) can be, and sometimes are, represented in the acoustic 
signal quite straightforwardly, one acoustic segment for each phonetic seg- 
ment. It is as if there were in the speech stream occasionally transparent 
stretches. We might expect that these stretches, in which the phonetic ele- 
ments are not restructured in the sound, could be treated as if they were a 



4 For a fuller discussion of this point, see Liberman, Cooper, Shankweiler,
and Studdert-Kennedy, 1967.
Figure 4. Schematic Spectrogram Showing Effects of Coarticulation in the Syllable [bæg]






cipher. There is, thus, a kind of intermittency in the difficulty of decoding
the acoustic signal. We may wonder whether that characteristic of the
speech code serves a significant purpose (such as providing the decoding
machinery with frequent opportunities to get back on the track when and if
things go wrong), but it is, in any case, an important characteristic to note,
as we will see later in the paper, because of the correspondence between 
what we might call degree of encoding and evidence for special processing. 

Lawfulness of the Code 



Given an encoded relation between two streams or levels of information 
such as we described in the preceding section, we should ask whether the con- 
version from the one to the other is made lawfully — that is, by the applica- 
tion of rules — or, alternatively, in some purely arbitrary way. To say that 
the conversion is by rule is to say that it can be rationalized, that there 
is, in linguistic terms, a grammar. If the connection is arbitrary, then 
there is, in effect, a code book; to decode a signal, one looks it up in the 
book. 


The speech code is, as we will see, not arbitrary, yet it might appear 
so to an intelligent but inarticulate cryptanalyst from Mars. Suppose that 
such a creature, knowing nothing about speech, were given many samples of 
utterances (in acoustic or visible form), each paired with its decoded or 
plain-text phonetic equivalent. Let us suppose further, as seems to us
quite reasonable, that he would finally conclude that the code could not be 
rationalized, that it could only be dealt with by reference to a code book. 
Such a conclusion would, of course, be uninteresting. From the point of 
view of one who knows that human beings readily decode spoken utterances, 
the code-book solution would also seem implausible, since the number of entries
in the book would have to be so very large. Having in mind the example
of [bæg] that we developed earlier, we see that the number of entries would,
at the least, be as great as the number of syllables. But, in fact, the num- 
ber would be very much larger than that, because coding influences sometimes 
extend across syllable boundaries (Öhman, 1966) and because the acoustic
shape of the signal changes drastically with such factors as rate of speaking 
and phonetic stress (Lindblom, 1963; Lisker and Abramson, 1967). 
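
The growth of such a code book can be illustrated with a short piece of arithmetic. The sketch below simply multiplies the number of syllables by the number of conditions that change the acoustic shape of each one. Every figure in it is an illustrative assumption, not a measurement, and the function name code_book_entries is hypothetical.

```python
def code_book_entries(n_syllables, n_rates, n_stress_levels, n_neighbor_contexts):
    """Count entries if every syllable needed a separate entry for each
    speaking rate, stress level, and neighboring-syllable context."""
    return n_syllables * n_rates * n_stress_levels * n_neighbor_contexts


if __name__ == "__main__":
    # All four figures are illustrative assumptions, not measurements.
    syllables_only = code_book_entries(1000, 1, 1, 1)
    with_context = code_book_entries(1000, 3, 2, 10)
    print(syllables_only, with_context)
```

Even with these modest assumed figures the book grows by the product of the factors, which is the point of the argument: a lookup table must list every combination, whereas a set of rules need not.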

At all events, our Martian would surely have concluded, to the contrary, 
that the speech code was lawful if anyone had described for him, even in the 
most general terms, the processes by which the sounds are produced. Taking 
the syllable [bæg], which we illustrated earlier, as our example, one might
have offered a description about as follows. The phonetic segments of the 
syllable are taken apart into their constituent features, such as place of 
production, manner of production, condition of voicing, etc. These features 
are represented, we must suppose, as neural signals that will become, ulti- 
mately, the commands to the muscles of articulation. Before they become the 
final commands, however, the neural signals are organized so as to produce 
the greatest possible overlap in activity of the independent muscles to which 
the separate features are assigned. There may also occur at this stage some 
reorganization of the commands so as to insure cooperative activity of the 
several muscle groups, especially when they all act on the same organ, as is
the case with the muscle groups that control the gestures of the tongue. But 
so far the features, or rather their neural equivalents, have only been 
organized; they can still be found as largely independent entities, which is
to say that they have not yet been thoroughly encoded. In the next stage
the neural commands (in the final common paths) cause muscular contraction,
but this conversion is, from our standpoint, straightforward and need not
detain us. It is in the final conversions, from muscle contraction to
vocal-tract shape to sound, that the output is radically restructured and that
true encoding occurs. For it is there that the independent but overlapping
activity of independent muscle groups becomes merged as they are reflected in
the acoustic signal. In the case of [bæg], the movement of the lips that
represents a feature of the initial consonant is overlapped with the shaping of
the tongue appropriate for the next vowel segment. In the conversion to
sound, the number of dimensions is reduced, with the result that the
simultaneous activity of lips and tongue affects exactly the same parameter of
the acoustic signal, for example, the second formant. We, and our Martian, see
then how it is that the consonant and the vowel are encoded.

The foregoing account is intended merely to show that a very crude model 
can, in general, account for the complexly encoded relation between the speech 
signal and the phonetic message. That model rationalizes the relation between 
these two levels of the language, much as the linguists' syntactic model 
rationalizes the relation between deep and surface structure. For that rea- 
son, and because of certain formal similarities we have described elsewhere 
(Mattingly and Liberman, 1969), we should say of our speech model that it is, 
like syntax, a grammar. It differs from syntax in that the grammar of speech 
is a model of a flesh-and-blood process, not, as in the case of syntax, a set
of rules with no describable physiological correlates. Because the grammar 
of speech corresponds to an actual process, we are led to believe that it is 
important, not just to the scientist who would understand the code but also 
to the ordinary listener who needs that same kind of understanding, albeit 
tacitly, if he is to perform appropriately the complex task of perceiving 
speech. We assume that the listener decodes the speech signal by reference 
to the grammar, that is, by reference to a general model of the articulatory 
process. This assumption has been called the motor theory of speech perception. 

Efficiency of the Code 



The complexity of the speech code is not a fluke of nature that man has 
somehow got to cope with but is rather an essential condition for the effi- 
ciency of speech, both in production and in perception, serving as a necessary 
link between an acoustic representation appropriate for transmission and a 
phonetic representation appropriate for storage in short-term memory. Con- 
sider production first. As we have already had occasion to say, the constit- 
uent features of the phonetic segments are assigned to more or less independ- 
ent sets of articulators, whose activity is then overlapped to a very great 
extent. In the most extreme case, all the muscle movements required to com- 
municate the entire syllable would occur simultaneously; in the more usual 
case, the activity corresponding to the several features is broadly smeared 
through the syllable. In either case the result is that phonetic segments 
are realized in articulation at rates higher than the rate at which any single 
muscle can change its state. The coarticulation that characterizes so much 
of speech production and causes the complications of the speech code seems 
well designed to permit relatively slow-moving muscles to transmit phonetic 
segments at high rates (Cooper, 1966). 
The efficiency of the code on the side of perception is equally clear.
Consider, first, that the temporal resolving power of the ear must set an 
upper limit on the rate at which we can perceive successive acoustic events. 
Beyond that limit the successive sounds merge into a buzz and become
unidentifiable. If speech were a cipher on the phonetic message (that is, if
each segment of the message were represented by a unit sound), then the limit
would be determined directly by the rate at which the phonetic segments were
transmitted. But given that the message segments are, in fact, encoded into
acoustic segments of roughly syllabic size, the limit is set not by the number
of phonetic segments per unit time but by the number of syllables. This rep- 
resents a considerable gain in the rate at which message segments can be per- 
ceived. 
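
The size of this gain can be sketched with simple arithmetic. The ear's limit is treated as a fixed number of resolvable acoustic units per second; in a cipher each unit carries one message segment, whereas in the speech code each unit is roughly a syllable carrying several. The limit of 10 units per second and the figure of three segments per syllable (as in [bæg]) are illustrative assumptions, and the function name max_segments_per_second is hypothetical.

```python
def max_segments_per_second(max_units_per_second, segments_per_unit):
    """Upper limit on perceived message segments per second when each
    resolvable acoustic unit carries `segments_per_unit` segments."""
    return max_units_per_second * segments_per_unit


if __name__ == "__main__":
    EAR_LIMIT = 10  # hypothetical: resolvable acoustic units per second
    cipher = max_segments_per_second(EAR_LIMIT, 1)  # one segment per unit sound
    code = max_segments_per_second(EAR_LIMIT, 3)    # e.g. a three-segment syllable
    print(cipher, code, code / cipher)
```

Whatever the true limit of the ear, the gain under this sketch equals the number of segments encoded into each syllable-sized stretch of sound.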

The efficient encoding described above results from a kind of parallel 
transmission in which information about successive segments is transmitted 
simultaneously on the same part of the signal. We should note that there is 
another, very different kind of parallel transmission in speech: cues for
the features of the same segment are carried simultaneously on different
parts of the signal. Recalling the patterns of Figure 4, we note that the 
cues for place of production are in the second-formant transition, while the 
first-formant transition carries the cues for manner and voicing. This is 
an apparently less complicated arrangement than the parallel transmission 
produced by the encoding of the consonant into the vowel, because it takes 
advantage of the ear's ability to resolve two very different frequency levels. 
We should point out, however, that the listener is not at all aware of the 
two frequency levels, as he is in listening to a chord that is made up of two 
pitches, but rather hears the stop, with all its features, in a unitary way. 

The speech code is apparently designed to increase efficiency in yet 
another aspect of speech perception: it makes possible a considerable gain
in our ability to identify the order in which the message segments occur.
Recent research by Warren et al. (1969) has shown that the sequential order 
of nonspeech signals can be correctly identified only when these segments 
have durations several times greater than the average that must be assigned 
to the message segments in speech. If speech were a cipher — that is, if 
there were an invariant sound for each unit of the message — then it would 
have to be transmitted at relatively low rates if we were to know that the 
word "task," for example, was not "taks" or "sakt" or "kats." But in the 
speech code, the order of the segments is not necessarily signalled, as we 
might suppose, by the temporal order in which the acoustic cues occur. Re- 
calling what we said earlier about the context-conditioned variation in the 
cues, we should note now that each acoustic cue is clearly marked by these 
variations for the position of the signalled segment in the message. In the 
case of the transition cues for [d] that we described earlier, for example, 
we should find that in initial and final positions (for example, in [dæg] and
[gæd]) the cues were mirror images. In listening to speech we somehow hear
through the context-conditioned variation in order to arrive at the canonical
form of the segment, in this case [d]. But we might guess that we also use
the context-determined shape of the cue to decide where in the sequence the
signalled segment occurred. In any case, the order of the segments we hear 
may be to a large extent inferred — quite exactly synthesized, created, or con- 
structed — from cues in a way that has little or nothing to do with the order 
of their occurrence in time. Given what appears to be a relatively poor
ability to identify the order of acoustic events from temporal cues, this
aspect of the speech code would significantly increase the rate at which we 
can accurately perceive the message. 

The speech code is efficient, too, in that it converts between a
high-information-cost acoustic signal appropriate for transmission and a
low-information-cost phonetic string appropriate for storage in some short-term
memory. Indeed, the difference in information rate between the two levels
of the speech code is staggering. To transmit the signal in acoustic form
and in high fidelity costs about 70,000 bits per second; for reasonable
intelligibility we need about 40,000 bits per second. Assuming a
frequency-volley theory of hearing through most of the speech range, we should
suppose that a great deal of nervous tissue would have to be devoted to the
storage of even relatively short stretches. But recoding into a phonetic
representation, we reduce the cost to less than 40 bits per second, thus
effecting a saving of about 1,000 times by comparison with the acoustic form
and roughly half that by comparison with what we might assume a reduced auditory
(but not phonetic) representation to be. We must emphasize, however, that
this large saving is realized only if each phonetic feature is represented
by a unitary pattern of nervous activity, one such pattern for each feature,
with no additional or extraneous "auditory" information clinging to the edges.
As we will see in the next section, the highly encoded aspects of speech do
tend to become highly digitized in that sense.
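
The arithmetic behind these savings can be checked with a short sketch. The
three rates are the figures quoted above; the reading of "roughly half" as a
500-fold saving, and the auditory rate it implies, are our own illustrative
assumptions, as are all the names used in the code.

```python
# Rates quoted in the text, in bits per second.
ACOUSTIC_HIGH_FIDELITY = 70_000
ACOUSTIC_INTELLIGIBLE = 40_000
# The text says "less than 40", so the savings computed here are lower bounds.
PHONETIC_UPPER_BOUND = 40


def saving_factor(source_rate, target_rate):
    """Return how many times cheaper target_rate is than source_rate."""
    return source_rate / target_rate


saving_vs_intelligible = saving_factor(ACOUSTIC_INTELLIGIBLE, PHONETIC_UPPER_BOUND)
saving_vs_high_fidelity = saving_factor(ACOUSTIC_HIGH_FIDELITY, PHONETIC_UPPER_BOUND)

# Assumption: "roughly half that" is read as about a 500-fold saving, which
# would imply a reduced auditory representation near 500 * 40 bits per second.
implied_auditory_rate = (saving_vs_intelligible / 2) * PHONETIC_UPPER_BOUND

print(saving_vs_intelligible)   # saving relative to the 40,000 bits/s signal
print(saving_vs_high_fidelity)  # saving relative to the 70,000 bits/s signal
print(implied_auditory_rate)    # auditory rate implied by the assumption above
```

On these figures the "about 1,000 times" saving corresponds to the
40,000-bit signal; measured against the high-fidelity signal the saving would
be larger still.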

Naturalness of the Code 

It is testimony to the naturalness of the speech code that members
of our species acquire it readily and use it with ease. While it is
true that a child reared in total isolation would not produce phonetically
intelligible speech, it is equally true that in normal circumstances he comes
to do that without formal tuition. Indeed, given a normal child in a normal
environment, it would be difficult to contrive methods that would effectively
prevent him from acquiring speech.

It is also relevant that, as we pointed out earlier, there is a universal
phonetics. A relatively few phonetic features suffice, given the various
combinations into which they are entered, to account for most of the phonetic
segments, and in particular those that carry the heaviest information load,
in the languages of the world. For example, stops and vowels, the segments
with which we have been exclusively concerned in this paper, are universal,
as is the co-articulated consonant-vowel syllable that we have used to
illustrate the speech code. Such phonetic universals are the more interesting
because they often require precise control of articulation; hence they are not
to be dismissed with the airy observation that since all men have similar
vocal tracts, they can be expected to make similar noises.

Because the speech code is complex but easy, we should suppose that man
has access to special devices for encoding and decoding it. There is now a
great deal of evidence that such specialized processors do exist in man,
apparently by virtue of his membership in the race. As a consequence, speech
requires no conscious or special effort; the speech code is well matched to
man and is, in precisely that sense, natural.

The existence of special speech processors is strongly suggested by the
fact that the encoded sounds of speech are perceived in a special mode. It
is obvious -- indeed so obvious that everyone takes it for granted -- that we do
not and cannot hear the encoded parts of the speech signal in auditory terms.
The first segments of the syllables [ba], [da], [ga] have no identifiable
auditory characteristics; they are unique linguistic events. It is as if they
were the abstract output of a device specialized to extract them, and only
them, from the acoustic signal. This abstract nonauditory perception is
characteristic of encoded speech, not of a class of acoustic events such as
the second-formant transitions that are sufficient to distinguish [ba], [da],
[ga], for when these transition cues are extracted from synthetic speech
patterns and presented alone, they sound just like the "chirps" or glissandi
that psychophysics would lead us to expect. Nor is this abstract
perception characteristic of the relatively unencoded parts of the speech
signal. The steady-state noises of the fricatives, [s] and [ʃ], for example,
can be heard as noises; moreover, one can easily judge that the noise of [s]
is higher in pitch than the noise of [ʃ].

Another characteristic of this kind of abstract perception, measured
quite carefully by a variety of techniques, is one that has been called
categorical perception (see Studdert-Kennedy, Liberman, Harris, and Cooper,
1970, for a review; Haggard, 1970, 1971b; Pisoni, 1971; Vinegrad, 1970). In
listening to the encoded segments of speech we tend to hear them only as
categories, not as a perceived continuum that can be more or less arbitrarily
divided into regions. This occurs even when, with synthetic speech, we
produce stimuli that lie at intermediate points along the acoustic continuum
that contains the relevant cues. In its extreme form, which is rather closely
approximated in the case of the stops, categorical perception creates a
situation, very different from the usual psychophysical case, in which the
listener can discriminate stimuli no better than he can identify them
absolutely.
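
The extreme case can be made concrete with a minimal sketch. It rests on
assumptions of our own, not on a procedure reported above: a continuum with
two categories, and an ABX-style trial in which the listener covertly labels
A, B, and X independently, answers from the labels alone, and guesses when A
and B receive the same label. Under those assumptions the expected proportion
correct is 0.5 * (1 + (p_a - p_b) ** 2), where p_a and p_b are the
probabilities that each stimulus is assigned to the first category. The
identification values below are hypothetical.

```python
def predicted_abx_correct(p_a, p_b):
    """Expected proportion correct when only category labels are available.

    p_a and p_b are the probabilities that stimuli A and B are labelled as
    the first of two categories (assumed model, see the lead-in).
    """
    return 0.5 * (1.0 + (p_a - p_b) ** 2)


# Hypothetical identification probabilities along a seven-step continuum.
identification = [1.0, 1.0, 0.95, 0.5, 0.05, 0.0, 0.0]

# Predicted discrimination for pairs of stimuli two steps apart.
predicted = [
    predicted_abx_correct(p_a, p_b)
    for p_a, p_b in zip(identification, identification[2:])
]

for step, value in enumerate(predicted):
    print(f"stimuli {step} and {step + 2}: {value:.3f}")
```

In this sketch a pair lying within one category is discriminated at chance,
and only a pair that straddles the boundary is discriminated well, which is
the pattern the extreme form of categorical perception describes.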

That the categorical perception of the stops is not simply a characteristic
of the way we process a certain class of acoustic stimuli -- in this case
the rapid frequency modulation that constitutes the (second-formant
transition) acoustic cue -- has been shown in a recent study (Mattingly,
Liberman, Syrdal, and Halwes, 1971). It was found there that, when listened to
in isolation, the second-formant transitions (the chirps we referred to
earlier) are not perceived categorically.

Nor can it be said that categorical perception is simply a consequence
of our tendency to attach phonetic labels to the elements of speech and then
to forget what the elements sounded like. If that were the case, we should
expect to find categorical perception of the unencoded steady-state vowels,
but in fact we do not, certainly not to the same extent (Fry, Abramson,
Eimas, and Liberman, 1962; Eimas, 1963; Stevens, Liberman, Ohman, and
Studdert-Kennedy, 1969; Pisoni, 1971; Fujisaki and Kawashima, 1969).
Moreover, categorical perception of the encoded segments has recently been
found to be reflected within 100 msec in cortical evoked potentials (Dorman,
1971).

In the case of the encoded stops, then, it appears that the listener has
no auditory image of the signal available to him, but only the output of a
specialized processor that has stripped the signal of all normal sensory
information and represented each phonetic segment (or feature) categorically
by a unitary neural event. Such unitary neural representations would
presumably be easy to store and also to combine, permute, and otherwise shuffle
around in the further processing that converts between sound and meaning.

But perception of vowels is, as we noted, not so nearly categorical.
The listener discriminates many more stimuli than he can absolutely identify,
just as he does with nonspeech; accordingly, we should suppose that, as with 
nonspeech, he hears the signal in auditory terms. Such an auditory image 
would be important in the perception of the pitch and duration cues that fig- 
ure in the prosodic aspects of speech; moreover, it would be essential that 
the auditory image be held for some seconds, since the listener must often 
wait to the end of a phrase or sentence in order to know what linguistic 
value to assign to the particular pitch and duration cues he heard earlier. 

Finally, we should note about categorical perception that, according to 
a recent study (Eimas, Siqueland, Jusczyk, and Vigorito, 1971), it is present
in infants at the age of four weeks. These infants discriminated synthetic 
[ba] and [pa]; moreover, and more significantly, they discriminated better, 
other things being equal, between pairs of stimuli which straddled the adult 
phonetic boundary than between pairs which lay entirely within the phonetic 
category. In other words, the infants perceived the voicing feature cate- 
gorically. From this we should conclude that the voicing feature is real, 
not only physiologically but in a very natural sense. 

Other, perhaps more direct, evidence for the existence of specialized 
speech processors comes from a number of recent experiments that overload 
perceptual mechanisms by putting competing signals simultaneously into the 
two ears (Broadbent and Gregory, 1964; Bryden, 1963; Kimura, 1961, 1964,
1967; Shankweiler and Studdert-Kennedy, 1967; Studdert-Kennedy and
Shankweiler, 1970). The general finding with speech signals, including nonsense
syllables that differ, say, only in the initial consonant, is that stimuli
presented to the right ear are better heard than those presented to the left;
with complex nonspeech sounds the opposite result, a left-ear advantage, is
found. Since there is reason to believe, especially in the case of competing 
and dichotically presented stimuli, that the contralateral cerebral repre- 
sentation is the stronger, these results have been taken to mean that speech, 
including its purely phonetic aspects, needs to be processed in the left hemi- 
sphere, nonspeech in the right. The fact that phonetic perception goes on in 
a particular part of the brain is surely consistent with the view that it is 
carried out by a special processor. 

The case for a special processor to decode speech is considerably 
strengthened by the finding that the right-ear advantage depends on the en- 
codedness of the signal. For example, stop consonants typically show a larger 
and more consistent right-ear advantage than unencoded vowels (Shankweiler and 
Studdert-Kennedy, 1967; Studdert-Kennedy and Shankweiler, 1970). Other recent 
studies have confirmed that finding and have explored even more analytically 
the conditions of the right-ear (left-hemisphere) advantage for speech (Darwin, 
1969, 1971; Haggard, 1971a; Haggard, Ambler, and Callow, 1969; Haggard and 
Parkinson, 1971; Kirstein and Shankweiler, 1969; Spellacy and Blumstein, 1970).
The results, which are too numerous and complicated to present here even in
summary form, tend to support the conclusion that processing is forced into
the left hemisphere (for most subjects) when phonetic decoding, as contrasted
with phonetic deciphering or with processing of nonspeech, must be carried out. 

Having referred in the discussion of categorical perception to the evi- 
dence that the phonetic segments (or, rather, their features) may be assumed 
to be represented by unitary neural events, we should here point to an inci- 
dental result of the dichotic experiments that is very relevant to that 
assumption. In three experiments (Halwes, 1969; Studdert-Kennedy and
Shankweiler, 1970; Yoder, pers. comm.) it has been found that listeners tend
significantly often to extract one feature (e.g., place of production) from the
input to one ear and another feature (e.g., voicing) from the other and
combine them to hear a segment that was not presented to either ear. Thus,
given [ba] to the left ear, say, and [ka] to the right, listeners will, when 
they err, far more often report [pa] (place feature from the left ear, voic- 
ing from the right) or [ga] (place feature from the right ear, voicing from 
the left) than [da] or [ta]. We take this as conclusive evidence that the 
features are singular and unitary in the sense that they are independent of 
the context in which they occur and also that, far from being abstract inven- 
tions of the linguist, they have, in fact, a hard reality in physiological 
and psychological processes. 
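
The blend errors just described can be enumerated with a small sketch. It
assumes, for illustration only, that each stop is fully described by one
place value and one voicing value; the table and the function name are
hypothetical and are not taken from the experiments cited.

```python
# Each stop consonant as a (place, voicing) pair, following the example above.
STOPS = {
    "b": ("labial", "voiced"),
    "d": ("alveolar", "voiced"),
    "g": ("velar", "voiced"),
    "p": ("labial", "voiceless"),
    "t": ("alveolar", "voiceless"),
    "k": ("velar", "voiceless"),
}

# Reverse lookup from a feature pair to the stop it specifies.
BY_FEATURES = {features: stop for stop, features in STOPS.items()}


def feature_blends(left, right):
    """Return stops built from the place of one ear and the voicing of the other.

    Stops that were actually presented are excluded, so the result contains
    only segments that were presented to neither ear.
    """
    left_place, left_voicing = STOPS[left]
    right_place, right_voicing = STOPS[right]
    blends = {
        BY_FEATURES[(left_place, right_voicing)],
        BY_FEATURES[(right_place, left_voicing)],
    }
    return blends - {left, right}


print(sorted(feature_blends("b", "k")))
```

For [ba] in the left ear and [ka] in the right, the sketch yields [p] and [g]
and never [d] or [t], which matches the direction of the errors reported.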

The technique of overloading the perceptual machinery by dichotic pres- 
entation has led to the discovery of yet another effect which seems, so far, 
to testify to the existence of a special speech processor (Studdert-Kennedy, 
Shankweiler, and Schulman, 1970). The finding, a kind of backward masking 
that has been called the "lag" effect, is that when syllables contrasting in 
the initial stop consonant are presented dichotically and offset in time, the 
second (or lagging) syllable is more accurately perceived. When such sylla- 
bles are presented monotically, the first (or leading) stimulus has the ad- 
vantage. In the dichotic case, the effect is surely central; in the monotic 
case there is presumably a large peripheral component. At all events, it is 
now known that, as in the case of the right-ear advantage, the lag effect is 
greater for the encoded stops than for the unencoded vowels (Kirstein, 1971; 
Porter, Shankweiler, and Liberman, 1969); it has also been found that highly 
encoded stops show a more consistent effect than the relatively less encoded 
liquids and semi-vowels (Porter, 1971). Also relevant is the finding that 
synthetic stops that differ only in the second-formant transitions show a lag 
effect but that the second-formant transitions alone (that is, the chirps) 
do not (Porter, 1971). Such results support the conclusion that this effect, 
too, may be specific to the special processing of speech. 5 

In sum, there is now a great deal of evidence to support the assertion 
that man has ready access to physiological devices that are specialized for 
the purpose of decoding the speech signal and recovering the phonetic message. 
Those devices make it possible for the human being to deal with the speech 
code easily and without conscious awareness of the process or its complexity. 
The code is thus a natural one. 



5 One experimental result appears so far not to fit with that conclusion:
syllables that differed in a linguistically irrelevant pitch contour
nevertheless gave a lag effect (Darwin, in press).

Resistance to Distortion



Everyone who has ever worked with speech knows that the signal holds up 
well against various kinds of distortion. In the case of sentences, a great 
deal of this resistance depends on syntactic and semantic constraints, which 
are, of course, irrelevant to our concern here. But in the perception of 
nonsense syllables, too, the message often survives attempts to perturb it.
This is due largely to the presence in the signal of several kinds of
redundancy. One arises from the phonotactic rules of the language: not all
sequences of speech sounds are allowable. That constraint is presumably owing,
though only in part, to limitations having to do with the possibilities of 
co-articulation. In any case, it introduces redundancy and may serve as an 
error-correcting device. The other kind of redundancy arises from the fact 
that most phonetic distinctions are cued by more than one acoustic difference. 
Perception of place of production of the stop consonants, for example, is 
normally determined by transitions of the second formant, by transitions of 
the third formant, and by the frequency position of a burst of noise. Each 
of these cues is more or less sufficient, and they are highly independent of 
each other. If one is wiped out, the others remain. 

There is one other way in which speech resists distortion that may be 
the most interesting of all because it implies for speech a special
biological status. We refer here to the fact that speech remains intelligible even
when it is removed about as completely as it can be from its normal, natural- 
istic context. In the synthetic patterns so much used by us and others, we 
can, and often do, play fast and loose with the nature of the vocal-tract 
excitation and with such normally fixed characteristics of the formants as 
their number, bandwidth, and relative intensity. Such departures from the 
norm, resulting in the most extreme cases in highly schematic representa- 
tions, remain intelligible. These patterns are more than mere cartoons, 
since certain specific cues must be retained. As Mattingly (in this Status 
Report) has pointed out, speech might be said in this respect to be like the 
sign stimuli that the ethologist talks about. Quite crude and unnatural
models, such as Tinbergen's (1951) dummy sticklebacks, elicit responses
provided only that the model preserves the significant characters of the
original display. As Manning (1969:39) says, "sign stimuli will usually be
involved where it is important never to miss making a response to the stimulus."
More generally, sign stimuli are often found when the correct transmission of 
information is crucial for the survival of the individual or the species. 
Speech may have been used in this way by early man. 

How to Tell Speech from Nonspeech 

For anyone who uses the speech code, and especially for the very young 
child who is in the process of acquiring it, it is necessary to distinguish 
the sounds of speech from other acoustic stimuli. How does he do this? The 
easy, and probably wrong, answer is that he listens for certain acoustic 
stigmata that mark the speech signal. One thinks, for example, of the nature 
of the vocal-tract excitation or of certain general characteristics of the 
formants. If the listener could identify speech on the basis of such rela- 
tively fixed markers, he would presumably decide at a low level of the per- 
ceptual system whether a particular signal was speech or not and, on the basis 
of that decision, send it to the appropriate processors. But we saw in the
preceding section that speech remains speech even when the signal is reduced
to an extremely schematic form. We suspect, therefore, that the distinction
between speech and nonspeech is not made at some early stage on the basis of
general acoustic characteristics.

More compelling support for that suspicion is to be found in a recent
experiment by T. Rand (pers. comm.). To one ear he presented all of the
first formant, including the transitions, together with the steady-state
parts of the second and third formants; when presented alone, these patterns
sound vaguely like [da]. To the other ear, with proper time relationships
carefully preserved, were presented the 50-msec second-formant and
third-formant transitions; alone, these sound like the chirps we have referred
to before. But when these patterns were presented together -- that is,
dichotically -- listeners clearly heard [ba], [da], or [ga] (depending on the
nature of the second-formant and third-formant transitions) in one ear and,
simultaneously, nonspeech chirps in the other. Thus, it appears that the same
acoustic events, the second-formant or third-formant transitions, can be
processed simultaneously as speech and nonspeech. We should suppose, then,
that the incoming signal goes indiscriminately to speech and nonspeech
processors. If the speech processors succeed in extracting phonetic features,
then the signal is speech; if they fail, then the signal is processed only as
nonspeech. We wonder if this is a characteristic of all so-called sign stimuli.

Security of the Code 

The speech code is available to all members of the human race, but prob- 
ably to no other species. There is now evidence that animals other than man, 
including even his nearest primate relatives, do not produce phonetic strings 
and their encoded acoustic correlates (Lieberman, 1968, 1971; Lieberman,
Klatt, and Wilson, 1969; Lieberman, Crelin, and Klatt, in press). This is
due, at least in part, to gross differences in vocal-tract anatomy between 
man and all other animals. (It is clear that speech in man is not simply an 
overlaid function, carried out by peripheral structures that evolved in con- 
nection with other more fundamental biological processes; rather, some im- 
portant characteristics of the human vocal tract must be supposed to have 
developed in evolution specifically in connection with speech.) Presumably, 
animals other than man lack also the mechanisms of neurological control 
necessary for the organization and coordination of the gestures of speech, 
but hard evidence for this is lacking. Unfortunately, we know nothing at all 
about how animals other than man perceive speech. Presumably, they lack the
special processor necessary to decode the speech signal. If so, we must
suppose that their perception of speech would be different from ours. They
should not hear categorically, for instance, and they should not hear the 
[di]-[du] patterns of Figure 3 as two-segment syllables which have the first 
segment in common. Thus, we should suppose that animals other than man can 
neither produce nor correctly perceive the speech code. If all our enemies 
were animals other than man, cryptanalysts would have nothing to do--or else 
they might have the excessively difficult task of breaking an animal code for 
which man has no natural key. 

Subcodes 

Our discussion so far has, perhaps, left the impression that there is
only one speech code. In one sense this is true, for it appears that there
is a universal ensemble of phonetic features defined by the communicative
possibilities of the vocal tract and the neural speech processor. But the
subset of phonetic features which are actually used varies from language to
language. Each language thus has its own phonetic subcode. A given
phonetic feature, however, will be articulated and perceived in the same way in
every language in which it is used. Thus, we should be surprised, for
instance, to find a language in which the perception of place for stops was
not categorical. If, as Eimas's results lead us to suppose, the child is born
with an intuitive knowledge of the universal phonetics, part of his task in
learning his native language is to identify the features of its phonetic
subcode and to forget the others. These unused features cannot be entirely
lost, however, since people do learn how to speak and understand more than
one language. But there is some evidence that bilinguals listening to their
second language do not necessarily use the same speech cues as native
speakers of the language do (Haggard, 1971b).

Secondary Codes 

A speaker-hearer can become aware of certain aspects of the linguistic
process, in particular its phonological and phonetic processes. The
awareness can then be exploited to develop "secondary codes," which may be
thought of as additional pseudolinguistic rules added to those of the language.
A simple example is a children's "secret language," such as Pig Latin, in which
a rule for metathesis and insertion applies to each word. We should suppose
that to speak or understand Pig Latin fluently would require not only the
unconscious knowledge of the linguistic structure of English that all native
speakers have, but also a conscious awareness of a particular aspect of this
structure (the phonological segmentation) and a considerable amount of
practice. There is evidence, indeed, that speakers of English who lack a
conscious awareness of phonological segmentation do not master Pig Latin,
despite the triviality of its rules (Savin, in press). The pseudolinguistic
character of Pig Latin explains why even a speaker of English who does not
know Pig Latin would not mistake it for a natural foreign language and why
one continues to feel a sense of artificiality in speaking it long after he
has mastered the trick.
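
The triviality of the rule is easy to see when it is written out. The sketch
below is one common version of Pig Latin, chosen by us for illustration; it
works on spelling rather than on phonological segments, which is itself an
assumption, and the function names are hypothetical.

```python
VOWEL_LETTERS = set("aeiou")


def pig_latin_word(word):
    """Move the initial consonant letters to the end, then add 'ay'.

    The movement is the metathesis and the added 'ay' is the insertion.
    Spelling stands in for phonological segmentation here (an assumption).
    """
    word = word.lower()
    for index, letter in enumerate(word):
        if letter in VOWEL_LETTERS:
            return word[index:] + word[:index] + "ay"
    return word + "ay"  # no vowel letter found: nothing to move


def pig_latin(sentence):
    """Apply the rule to each whitespace-separated word."""
    return " ".join(pig_latin_word(word) for word in sentence.split())


print(pig_latin("pig latin"))
print(pig_latin("speech code"))
```

The rule itself takes a few lines; what it presupposes is that the word has
already been divided into an initial consonant group and a remainder, which
is exactly the conscious segmentation discussed above.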

Systems of versification are more important kinds of secondary codes.
For a literate society the function of verse is primarily esthetic, but in
preliterate societies verse is a means of transmitting verbal information of
cultural importance with a minimum of paraphrase. The rules of verse are in
effect an addition to the phonology which requires that recalled material
not only should preserve the semantic values of the original, but should also
conform to a specific, rule-determined phonetic pattern. Thus in Latin epic
poetry a line of verse is divided into six feet, each of which must have one
of several patterns of long and short syllables. The requirement to conform
to this pattern excludes almost all possible renditions other than the correct
one and makes memorization easier and recall more accurate. Since the
rules of verse are in general more elaborate than those of Pig Latin, and a
greater degree of linguistic awareness is necessary to compose verse, this more
complex skill has thus traditionally been the specialized occupation of a few
members of a society, though a passive form of the skill, permitting the
listener to distinguish "correct" from "incorrect" lines without scanning them
syllable by syllable, has been possible for a much larger number of people.
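
How strongly such a pattern constrains a line can be illustrated with a
sketch. The foot patterns used are our assumption, taken from the usual
textbook description of the dactylic hexameter rather than from the text
above: each of the first five feet is long-short-short or long-long, and the
sixth is long-long or long-short. Syllable quantities are written as a string
of L and S marks, and all names are hypothetical.

```python
from itertools import product


def scan(quantities):
    """Split a string of 'L'/'S' marks into six feet, or return None.

    Assumed patterns: feet one to five are 'LSS' or 'LL'; foot six is 'LL'
    or 'LS'. The mark after each initial 'L' decides which foot is meant,
    so the division into feet is never ambiguous.
    """
    feet = []
    position = 0
    while len(feet) < 5:
        if quantities.startswith("LSS", position):
            feet.append("LSS")
            position += 3
        elif quantities.startswith("LL", position):
            feet.append("LL")
            position += 2
        else:
            return None
    last_foot = quantities[position:]
    if last_foot in ("LL", "LS"):
        feet.append(last_foot)
        return feet
    return None


print(scan("LSSLSSLLLLLSSLL"))

# Every possible string of 12 to 17 marks, the range of lengths that the
# assumed patterns allow, and the subset that scans.
candidates = ["".join(marks) for n in range(12, 18) for marks in product("LS", repeat=n)]
valid = [line for line in candidates if scan(line) is not None]
print(len(valid), "of", len(candidates), "candidate lines scan")
```

Under these assumed patterns only a very small fraction of the possible
sequences of long and short syllables is admissible, which is the sense in
which the requirement excludes almost all renditions other than the correct
one.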

Writing, like versification, is also a secondary code for transmitting
verbal information accurately, and the two activities have more in common than 
might at first appear. The reader is given a visually coded representation of 
the message, and this representation, whether ideographic, syllabic, or alpha- 
betic, provides very incomplete information about the linguistic structure and 
semantic content of the message. The skilled reader, however, does not need 
complete information and ordinarily does not even need all of the partial in- 
formation given by the graphic patterns but rather just enough to exclude most 
of the other messages which might fit the context. Being competent in his 
language, knowing the rules of the writing system, and having some degree of
linguistic awareness, he can reproduce the writer's message in reasonably
faithful fashion. (Since the specific awareness required is awareness of phonological
segmentation, it is not surprising that Savin's group of English speakers who 
cannot learn Pig Latin also have great difficulty in learning to read.) 

The reader's reproduction is not, as a rule, verbatim; he makes small 
deviations which are acceptable paraphrases of the original and overlooks or, 
better, unconsciously corrects misprints. This suggests that reading is an 
active process of construction constrained by the partial information on the 
printed page, just as remembering verse is an active process of construction, 
constrained, though much less narrowly, by the rules of versification. As 
Bartlett (1932) noted for the more general case, the processes of perception 
and recall of verbal material are not essentially different. 

For our purposes, the significant fact about pseudolinguistic secondary
codes is that, while being less natural than the grammatical codes of language,
they are nevertheless far from being wholly unnatural. They are more or less
artificial systems based on those aspects of natural linguistic activities
which can most readily be brought to consciousness: the levels of phonology
and phonetics. All children do not acquire secondary codes maturationally, 
but every society contains some individuals who, if given the opportunity, 
can develop sufficient linguistic awareness to learn them, just as every 
society has its potential dancers, musicians, and mathematicians. 

LANGUAGE, SPEECH, AND RESEARCH ON MEMORY

What we have said about the speech code may be relevant to research on 
memory in two ways: most directly, because work on memory for linguistic
information, to which we shall presently turn, naturally includes the speech
code as one stage of processing; and, rather indirectly, because the charac- 
teristics of the speech code provide an interesting basis for comparison with 
the kinds of code that students of memory, including the members of this con- 
ference, talk about. In this section of the paper we will develop that rel- 
evance, summarizing where necessary the appropriate parts of the earlier dis- 
cussion. 

The Speech Code in Memory Research 

Acoustic, auditory, and phonetic representations. When a psychologist
deals with memory for language, especially when the information is presented 
as speech sounds, he would do well to distinguish the several different forms 
that the information can take, even while it remains in the domain of speech. 
There is, first, the acoustic form in which the signal is transmitted. This
is characterized by a poor signal-to-noise ratio and a very high bit rate.
The second form, found at an early stage of processing in the nervous system,
is auditory. This neural representation of the information maps in a
relatively straightforward way onto the acoustic signal. Of course, the acoustic
and auditory forms are not identical. In addition to the fact that one is
mechanical and the other neural, it is surely true that some information has
been lost in the conversion. Moreover, as we pointed out earlier in the paper,
it is likely that the signal has been sharpened and clarified in certain
ways. If so, we should assume that the task was carried out by devices not
unlike the feature detectors the neurophysiologist and psychologist now
investigate and that apparently operate in visual perception, as they do in
hearing, to increase contrast and extract certain components of the pattern.
But we should emphasize that the conversion from acoustic to auditory form,
even when done by the kind of device we just assumed, does not decode the
signal, however much it may improve it. The relation of the auditory to the
acoustic form remains simple, and the bit rate, though conceivably a good deal
lower at this neural stage than in the sound itself, is still very high. To
arrive at the phonetic representation, the third form that the information
takes, requires the specialized decoding processes we talked about earlier
in the paper. The result of that decoding is a small number of unitary neural
patterns, corresponding to phonetic features, that combine to make the
somewhat greater number of patterns that constitute the phonetic segments;
arranged in their proper order, these segments become the message conveyed by
the speech code. The phonetic representations are, of course, far more
economical in terms of bits than the auditory ones. They also appear to have
special standing as unitary physiological and biological realities. In general,
then, they are well suited for storage in some kind of short-term memory until
enough have accumulated to be recoded once more, with what we must suppose is a
further gain in economy.

Even when language is presented orthographically to the subject's eyes,
the information seems to be recoded into phonetic form. One of the most
recent and also most interesting treatments of this matter is to be found in
a paper by Conrad (in press). He concludes, on the basis of considerable
evidence, that while it is possible to hold the alphabetic shapes as visual
information in short-term memory--deaf-mute children seem to do just
that--the information can be stored (and dealt with) more efficiently in
phonetic form. We suppose that this is so because the representations of the
phonetic segments are quite naturally available in the nervous system in a
way, and in a form, that representations of the various alphabetic shapes are
not. Given the complexities of the conversion from acoustic or auditory form
to phonetic, and the advantages for storage of the phonetic segments, we
should insist that this is an important distinction.

Storage and transmission in man and machine. We have emphasized that in
spoken language the information must be in one form (acoustic) for
transmission and in a very different form (phonetic or semantic) for storage,
and that the conversion from the one to the other is a complex recoding. But
there is no logical requirement that this be so. If all the components of the
language system had been designed from scratch and with the same end in view,
the complex speech code might have been unnecessary. Suppose the designer had
decided to make do with a smaller number of empty segments, like the phones
we have been talking about, that have to be transmitted in rapid succession.
The engineer might then have built articulators able to produce such
sequences simply--alphabetically or by a cipher--and ears that could perceive
them. Or if he had, for some reason, started with sluggish articulators and
an ear that could not resolve rapid-fire sequences of discrete acoustic
signals, he might have used a larger inventory of segments transmitted at a
lower rate. In either case the information would not have had to be
restructured in order to make it differentially suitable for transmission and
storage; there might have been, at most, a trivial conversion by means of a
simple cipher. Indeed, that is very much the situation when computers "talk"
to each other. The fact that the human being cannot behave so simply, but
must rather use a complex code to convert between transmitted sound and
stored message, reflects the conflicting design features of components that
presumably developed separately and in connection with different biological
functions. As we noted in an earlier part of the paper, certain structures,
such as the vocal tract, that evolved originally in connection with
nonlinguistic functions have undergone important modifications that are
clearly related to speech. But these adaptations apparently go only so far as
to make possible the further matching of components brought about by devices
such as those that underlie the speech code.
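
The contrast between a simple cipher and a restructuring code can be made
concrete with a toy sketch. Everything in it is an assumption made for
illustration: the symbol table, the function names, and above all the
overlapping-pair scheme, which is only a stand-in for a code that spreads each
segment over neighboring stretches of signal. It is not offered as a model of
how speech is actually encoded.

```python
# Toy illustration only: the symbols, mappings, and function names are
# hypothetical and are not taken from the paper.

CIPHER = {"b": "1", "a": "2", "g": "3"}        # one signal unit per segment
DECIPHER = {unit: seg for seg, unit in CIPHER.items()}


def encipher(segments):
    """Cipher: each segment maps to its own unit, whatever its neighbors are."""
    return [CIPHER[s] for s in segments]


def decipher(signal):
    """Inverse lookup, one unit at a time; no restructuring is needed."""
    return [DECIPHER[u] for u in signal]


def encode(segments):
    """Toy 'code': every signal unit carries two adjacent segments at once,
    so no stretch of signal corresponds to a single segment."""
    padded = ["#"] + list(segments) + ["#"]    # "#" marks silence at the edges
    return [padded[i] + padded[i + 1] for i in range(len(padded) - 1)]


def decode(signal):
    """Recover the segments by reading the second half of each unit except
    the last, whose second half is only the closing silence."""
    return [unit[1] for unit in signal[:-1]]


if __name__ == "__main__":
    print(encipher("bag"))   # the unit for "a" is the same in every context
    print(encode("bag"))     # "a" is spread over two context-dependent units
    print(encode("gab"))
```

In the cipher the unit standing for a given segment never changes; in the toy
code the same segment shows up in different units depending on its neighbors,
and the receiver must use the relations among units to get the segments back.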



It is obvious enough that the ear evolved long before speech made its
appearance, so we are not surprised, when we approach the problem from that
point of view, to discover that not all of its characteristics are ideally
suited to the perception of speech. But when we consider speech production
and find that certain design features do not mesh with the characteristics
of the ear, we are led to wonder if there are not aspects of the process--in
particular, those closer to the semantic and cognitive levels--that had
independently reached a high state of evolutionary development before the
appearance of language as such and had then to be imposed on the best
available components to make a smoothly functioning system. Indeed, Mattingly
(this Status Report) has explicitly proposed that language has two sources,
an intellect capable of semantic representation and a system of "social
releasers" consisting of articulated sounds, and that grammar evolved as an
interface between these two very different mechanisms.

In the alphabet, man has invented a transmission vehicle for language
far simpler than speech--a secondary code, in the sense discussed earlier.
It is a straightforward cipher on the phonological structure, one optical
shape for each phonological segment, and has a superb signal-to-noise ratio.
We should suppose that it is precisely the kind of transmission vehicle that
an engineer might have devised. That alphabetic representations are, indeed,
good engineering solutions is shown by the relative ease with which engineers
have been able to build the so-called optical character readers. However,
the simple arrangements that are so easy for machines can be hard for human
beings. Reading comes late in the child's development; it must be taught;
and many fail to learn. Speech, on the other hand, bears a complex relation
to language, as we have seen, and has so far defeated the best efforts of
engineers to build a device that will perceive it. Yet this complex code is
mastered by children at an early age, some significant proficiency being
present at four weeks; it requires no tuition; and everyone who can hear
manages
to perceive speech quite well.

The relevance of all this to the psychology of memory is an obvious and
generally observed caution: namely, that we be careful about explaining
human beings in terms of processes and concepts that work well in intelligent
and remembering machines. We nevertheless make the point because we have in
speech a telling object lesson. The speech code is an extremely complex
contrivance, apparently designed to make the best of a bad fit between the
requirement that phonetic segments be transmitted at a rapid rate and the
inability of the mouth and the ear to meet that requirement in any simple
way. Yet the physiological devices that correct this mismatch are so much a
part of our being that speech works more easily and naturally for human
beings than any other arrangement, including those that are clearly simpler.

More and less encoded elements of speech. In describing the
characteristics of the speech code we several times pointed to differences
between stop consonants and vowels. The basic difference has [...] their
perception requires a decoding process; vowels can be, and sometimes are,
represented by encipherment, as it were, in the speech signal, so they might
be perceived in a different and simpler way. We are not surprised, then, that
stops and vowels differ in their tendencies toward categorical perception as
they do also in the magnitude of the right-ear advantage and the lag effect
(see above).

An implication of this characteristic of the speech code for research
in immediate memory has appeared in a study by Crowder (in press). Crowder
suggests that vowels produce a "recency" effect, but stops do not. Crowder
and Morton (1969) had found that, if a list of spoken words is presented to
a subject, there is an improvement in recall for the last few items on the
list, but no such recency effect is found if the list is presented visually.
To explain this modal difference, Crowder and Morton suggested that the
spoken items are held for several seconds in an "echoic" register in
precategorical or raw sensory form. At the time of recall these items are
still available to the subject in all their original sensory richness and are
therefore easily remembered. When presented visually, the items are held for
only a fraction of a second. In his more recent experiment Crowder has found
that for lists of stop-vowel syllables, the auditory recency effect appears
if the syllables on the list contrast only in their vowels but is absent if
they contrast only in their stops. If Crowder and Morton's interpretation of
their 1969 result is correct, at least in general terms, then the difference
in recency effect between stops and vowels is what we should expect. As we
have seen in this paper, the special process that decodes the stops strips
away all auditory information and presents to immediate perception a
categorical linguistic event the listener can be aware of only as
[b, d, g, p, t, or k]. Thus there is for these segments no auditory,
precategorical form that is available to consciousness for a time long enough
to produce a recency effect. The relatively unencoded vowels, on the other
hand, are capable of being perceived in a different way. Perception is more
nearly continuous than categorical: the listener can make relatively fine
discriminations within phonetic classes because the auditory characteristics
of the signal can be preserved for a while. (For a relevant model and
supporting data see Fujisaki and Kawashima, 1969.) In the experiment by
Crowder, we may suppose that these same auditory characteristics of the
vowel, held for several seconds in an echoic sensory register, provide the
subject with the rich, precategorical information that enables him to recall
the most recently presented items with relative ease.

It is characteristic of the speech code, and indeed of language in
general, [...] (e.g., the vowels); they require special processing and can be
expected to behave in different ways when memory codes are used.

Speech as a special process. Much of what we [...] was to show that it
is complex in a special [...] examine the formal aspects [...] grammatical
codes of phonology and syntax--which is to say that speech is an integral
[...] processor works, so we cannot compare it very [...] task it [...]
appear to be different in important ways from the tests that confront [...]
speech processing may be different from [...] information may be different
from [...] memory such as, for example, visual or spatial.

Speech appears to be specialized, not only by comparison with other
perceptual or cognitive systems of the human being, but also by comparison
with any of the systems so far found in other animals. While there may be
some question about just how many of the so-called linguistic processes
monkeys are capable of, it seems that the speech code is unique to man. To
the extent, then, that this code is used in memory processes--for example, in
short-term memory--we should be careful about generalizing results across
species.

Speech and Memory Codes Compared

It will be recalled that we began by adopting the view that paraphrase
has more to do with the processes by which we remember than with those by
which we forget. [...] language, they normally use the devices of grammar to
recode the information from the form in which it was transmitted into a form
suitable for storage. On the occasion of recall they code it back into
another transmittable form [...] meaning. Thus, grammar becomes an essential
part of normal memory processes. We have therefore directed our attention to
grammatical codes, taking these to be the rules by which conversions are
carried out from one linguistic level to another. [...] those of grammar
generally.



[...] speech has the advantage in this respect: it has been more accessible
to psychological investigation than the other grammatical codes. As a result,
we have been able to characterize speech in ways that [...] codes and [...]
comparison between them and the speech code.

We will apply the same convention to this discussion of conventional
memory codes that we applied to our discussion of grammatical codes. That is,
the term "code" is reserved for the rules which convert from one
representation of the information to another. [...] of the speech code [...]
representations and inferred the properties of the code from the relation
between the two.

In the most familiar type of experiment, the materials the subject is
required to remember are not the longer segments of language, such as
sentences or discourses, but words or nonsense syllables. Typically in such
an experiment, the subject is required to reproduce the information exactly
as it was presented, and his response is counted as an error if he does not.
Under these circumstances it is difficult, if not impossible, for the subject
to use his linguistic coding devices to their fullest extent, or in the most
natural way. However, it is quite evident that the subject in this situation
nevertheless uses codes; moreover, he uses them for the same general purpose
to which, we have argued, language is so often put, which is to enable him to
store the information in a form different from that in which it was
presented. Given the task of remembering unfamiliar sequences such as
consonant trigraphs, the subject may employ, sometimes to the experimenter's
chagrin, some form of linguistic mediation (Montague, Adams, and Kiess,
1966). That is, he converts the consonant sequence into a sentence or
proposition, which he then stores along with a rule for future recovery of
the consonant sequence. In a recent examination of how people remember
nonsense syllables, Prytulak (1971) concluded that such mediation is the rule
rather than the exception. Reviewing the literature on memory for verbal
materials, Tulving and Madigan (1970) describe two kinds of conversions: one
is the substitution of an alternative symbol for the input stimulus together
with a conversion rule; the other is the storage of ancillary information
along with the to-be-remembered item. Most generally, it appears that when a
subject is required to remember unrelated words, paired-associates, or digit
strings, he tries to give pattern to the material, to restructure it in terms
of familiar relationships. Or he resorts, at least in some situations, to the
"chunking" that Miller (1956) first described and that has become a part of
memory theory (Mandler, 1967). Or he converts the verbal items into visual
images (Paivio, 1969; Bower, 1970). At all events, we find that, as Bower
(1970) has pointed out, bare-bones rote memorization is tried only as a last
resort.

The subject converts to-be-remembered material which is [...] relatively
meaningless into [...] of verbal items or images for storage. What can be
said about the rules relating the [...] between the two levels [...] language
in general? The differences would appear to be greater than the similarities.
Many of these conversions that we have cited are more properly described as
simple ciphers than as codes, in the sense that we have used these terms
earlier, since there is in these cases no restructuring of the information
but only a rather straightforward substitution of one representation for
another. Moreover, memory codes of this type are arbitrary and idiosyncratic,
the connection between the two forms of the information having arisen often
out of the accidents of the subject's life history; such rules as there may
be (for example, to convert each letter to a word beginning with that letter)
do not truly rationalize the code but rather fall back, in the end, on what
is, in effect, a code book. As often as not, the memory codes are also
relatively [...] require [...] to be difficult [...] efficiency, [...] highly
efficient [...].



In memory experiments which permit the kind of conversion represented
by paraphrase, we would suppose that the memory codes would be much like
language codes, and we should expect them to have characteristics similar to
those of the code we know as speech. The conversions would be complex
recodings, not simple substitutions; they would be capable of being
rationalized; and they would, of course, be highly efficient for the uses to
which they were being put. But we would expect their most obvious
characteristic to be that of naturalness. People do not ordinarily contrive
mnemonic aids by which to remember the gist of conversations, nor do they
necessarily devise elaborate schemes for recalling stories and the like, yet
they are reasonably adept at such things. They do not [...] to commit a
message to memory; more important, they do not have to be taught how to do
this sort of remembering.



[...] controls and measures are hard to arrange. [...] that inevitably occur
in long discourses will span many sentences and imply recoding processes so
complex that we hardly know how to think about them. Yet, if the arbitrary,
idiosyncratic ciphers which we have described are simply devices to mold
[...] materials into a form amenable to the natural codes, then it must be
that understanding of such ciphers will advance more surely with knowledge of
the natural bases from which they derive and to which they must, presumably,
be anchored.